SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity
Abstract
Soft cardinality proved to be a very strong text-overlap baseline for the semantic textual similarity (STS) task, obtaining third place in SemEval-2012. This year, besides the plain text-overlap approach, two distributional word-similarity functions derived from the ukWaC corpus were tested within soft cardinality. These measures helped improve the performance of the text-overlap approach. Further, they were combined with other features using regression, obtaining 18th, 22nd and 23rd positions among the 90 participating systems in the official 2013 shared task ranking at *SEM. After the release of the gold-standard annotations of the test data, we observed that the bare similarity measures, without the use of regression, would have obtained 6th, 7th and 8th positions. Moreover, the simple arithmetic average of these similarity measures would have ranked 4th (mean=0.5747). This paper describes the submitted system and the similarity measures that would have obtained those better results.
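As a rough illustration of the idea behind soft cardinality, the sketch below uses the commonly cited definition, in which each element of a collection contributes the inverse of its summed similarity to all elements, so near-duplicates count for less than one. The abstract does not spell out the formula, so this form, the softness exponent `p`, and the Dice-style overlap coefficient are assumptions. The character-bigram similarity `ngram_sim` is only a stand-in for the distributional word-similarity functions mentioned above, which would require the ukWaC-derived resources. All function names are hypothetical.

```python
from typing import Callable, Sequence


def char_ngrams(word: str, n: int = 2) -> set:
    """Character n-grams of a word padded with boundary markers."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_sim(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (stand-in word similarity)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def soft_cardinality(words: Sequence[str],
                     sim: Callable[[str, str], float],
                     p: float = 1.0) -> float:
    """Sum over elements of 1 / (sum of sim(a, b)**p over all elements b)."""
    return sum(1.0 / sum(sim(a, b) ** p for b in words) for a in words)


def soft_dice(x: Sequence[str], y: Sequence[str],
              sim: Callable[[str, str], float],
              p: float = 1.0) -> float:
    """Dice-style text similarity built from soft cardinalities (assumed form)."""
    cx = soft_cardinality(x, sim, p)
    cy = soft_cardinality(y, sim, p)
    if cx + cy == 0:
        return 0.0  # both texts empty
    # Soft intersection via inclusion-exclusion on the concatenated texts.
    c_union = soft_cardinality(list(x) + list(y), sim, p)
    return 2 * (cx + cy - c_union) / (cx + cy)


if __name__ == "__main__":
    s1 = "a man is playing a guitar".split()
    s2 = "a man plays the guitar".split()
    print(soft_cardinality(["cat", "cats"], ngram_sim))
    print(soft_dice(s1, s2, ngram_sim))
```

With an exact-match similarity the soft cardinality reduces to the ordinary count of distinct words, which is why the approach can be read as a generalization of plain text overlap.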
Similar resources
DLS@CU-CORE: A Simple Machine Learning Model of Semantic Textual Similarity
We present a system submitted in the Semantic Textual Similarity (STS) task at the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013). Given two short text fragments, the goal of the system is to determine their semantic similarity. Our system makes use of three different measures of text similarity: word n-gram overlap, character n-gram overlap and semantic overlap. Usi...
Distributional semantic models for detection of textual entailment
We present our experiments on integrating and evaluating distributional semantics with the recognising textual entailment (RTE) task. We consider entailment as semantic similarity between text and hypothesis coupled with an additional heuristic, which can be either selecting the top-scoring hypothesis or applying a pre-defined threshold. We show that a distributional model is particularly good at detecting...
Robust semantic text similarity using LSA, machine learning, and linguistic resources
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from se...
LIPN-CORE: Semantic Text Similarity using n-grams, WordNet, Syntactic Analysis, ESA and Information Retrieval based Features
This paper describes the system used by the LIPN team in the Semantic Textual Similarity task at SemEval 2013. It uses a support vector regression model, combining different text similarity measures that constitute the features. These measures include simple distances like Levenshtein edit distance, cosine, Named Entities overlap and more complex distances like Explicit Semantic Analysis, WordN...
Distributional Semantic Models for Clinical Text Applied to Health Record Summarization
As information systems in the health sector are becoming increasingly computerized, large amounts of care-related information are being stored electronically. In hospitals clinicians continuously document treatment and care given to patients in electronic health record (EHR) systems. Much of the information being documented is in the form of clinical notes, or narratives, containing primarily u...